GH-140638: Add a GC "candidates" stat #141814

brandtbucher · 2025-11-21T06:21:12Z

This exposes a counter for the number of unique objects seen in the current generation. Previously, something like sum(len(gc.get_objects(g)) for g in range(gen + 1)) was needed to get the a statistic like this (in this PR I've made the choice to collect this statistic in update_refs, and omit immortal but include marked members of the current generation).

It's useful for things like info["collected"] / info["candidates"], to get the efficiency of a collection.

Issue: Add duration and visited to the gc.callbacks and gc.get_stats() dicts #140638

📚 Documentation preview 📚: https://cpython-previews--141814.org.readthedocs.build/

pablogsal

LGTM! 🚀

I assumed that visited is not initialized because IIRC the interpreter struct is calloc-ed. I'm on my phone so is not easy to double check but just a thing to check before landing

pablogsal · 2025-11-21T16:23:00Z

Python/gc.c

 add_stats(GCState *gcstate, int gen, struct gc_collection_stats *stats)
 {
    gcstate->generation_stats[gen].duration += stats->duration;
+    gcstate->generation_stats[gen].visited += stats->visited;


I don't think this matters but what would happen if this overflows?

Are we generally worried about overflowing Py_ssize_t?

I can switch this to size_t if we are so at least the overflow isn't UB. A Python int seems sort of heavy for this, but we could go that route if needed too.

Are we generally worried about overflowing Py_ssize_t?

Nah, I think this has the same problem as the other fields just wanted to raise it here in case we both believe is worth fixing. I don't think is important, but it is possible. WDYT? I am fine to ignore just to be clear

I'll just check for overflow here and cap the value at PY_SSIZE_T_MAX to be paranoid about UB.

Actually, I'm not worred about this (the fix looks like more trouble than it's worth). If we feel like this is a problem we can just make these fields unsigned in another PR.

pablogsal · 2025-11-21T16:24:37Z

Doc/library/gc.rst

      "uncollectable": When *phase* is "stop", the number of objects
      that could not be collected and were put in :data:`garbage`.

+      "visited": When *phase* is "stop", the number of unique objects visited


I may be missing something but if I understand this correctly this counts objects in the collection set, not all objects visited during traversal (which would include objects referenced by containers). Consider maybe renaming to "objects_in_collection" or "initial_objects" or updating implementation to count all visited objects (including referenced ones during subtract_refs)

Yeah, @nascheme pointed this out too on the issue. I think naming is hard... this can really be any of:

Total visits performed (edges in the graph).

Total unique objects visited (nodes in the graph, optionally minus immortals, marked-alive objects, untracked objects, etc.).

Total objects in the current generation (nodes in the generation subgraph, optionally minus immortals, marked-alive objects, untracked objects, etc.).

I'm really trying not to overthink it... I think the probably most intuitive answer is something along the lines of "How many unique objects did the GC consider before eventually freeing <collected> number of them"? Like I said above, collected / visited should be a meaningful efficiency metric to help see at a glance if you're running at the right times (in addition to memory metrics, of course).

But definitely open to suggestions. I naturally lean towards a simpler API name and more nuance in the docs for it, rather than trying to convey all of the nuance in one long name alone.

Put another way, it seems like objects that can't count towards collected also shouldn't count towards visited. So I'm pretty sure that would exclude anything reachable from, but not part of, the current generation?

I agree but even keeping the "naming is hard" optics, I would say that anything other than visited is good because at least we can agree that surely this is not "number of visited objects" no? Maybe candidates?

Sure, let's do "candidates"!

if "objects_marked" is not a significant fraction of "candidates", there could be something wrong.

Clarification about this: that makes more sense when looking at the free-threaded GC. For the default build GC, the mark-alive operation is done as a separate step on the incremental collection. In that step, candidates == objects_marked. In other incremental steps, objects_marked == 0. If we change gc_collect_full() to run a mark-alive operation first, comparing those numbers will be useful for the default build GC too.

Yeah, after thinking about it more I agree marked-alive should be included.

For the default build GC, I think you can just add the objects_marked count to the candidates total.

I think this could overcount, if we mark objects that aren't part of the current generation? I'll look into it a bit more.

I don't think this is useful and we should change it to not invoke the callback in that case.

Eh, I'm not sure I like this. I think it's reasonable to expect callbacks to be called exactly twice, and can easily imagine scenarios where a "start" call with no corresponding "stop" call could really screw things up (for example, a callback that sets gc.set_debug(gc.DEBUG_SAVEALL) on some tiny fraction of collections to log common sources of cycles).

Finally, perhaps we should include the "objects_marked" count in the info dict as well.

Sure, I can open another PR for this after (unless you wanted to take it). While we're at it, "untracked" seems useful, too.

Hm, looking at the default GC implementation, I agree it's not super intuitive how to attribute marked-alive objects to individual increments.

Probably for the mark phase, we just populate candidates in the "stop" info, and ignore the marked objects in other collections. That way the aggregate stats still make sense... maybe this is what you were suggesting and I just misunderstood?

Okay, this is what I did (for the marking phase, dump the count of objects marked into the candidates stat).

I think it's reasonable to expect callbacks to be called exactly twice

Yeah, I think that's more important than avoiding "useless" callbacks where work_to_do < 0. I was only looking a the phase == 'stop' calls.

brandtbucher · 2025-11-21T16:29:15Z

I assumed that visited is not initialized because IIRC the interpreter struct is calloc-ed. I'm on my phone so is not easy to double check but just a thing to check before landing

Yeah, per-collection stats have an initializer here (here in free-threading), so unnamed members are zeroed. The process-wide stats live on the gcstate, which is part of the interpreter state.

brandtbucher · 2025-11-22T19:01:19Z

My choice of wording for the docs:

candidates is the total number of objects in this generation which were considered for collection and traversed.

brandtbucher added 6 commits November 19, 2025 11:35

Add a "visited" stat.

bb3fc94

Count visited in update_refs

d89ecdc

Remove old void cast

3ee8db9

Update docs and tests

41814f3

blurb add

8f4ce45

Omit immortal and live objects

13a97f3

brandtbucher requested a review from nascheme November 21, 2025 06:21

brandtbucher self-assigned this Nov 21, 2025

brandtbucher added the type-feature A feature request or enhancement label Nov 21, 2025

brandtbucher requested a review from pablogsal as a code owner November 21, 2025 06:21

brandtbucher added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 21, 2025

bedevere-app bot added the awaiting core review label Nov 21, 2025

bedevere-app bot mentioned this pull request Nov 21, 2025

Add duration and visited to the gc.callbacks and gc.get_stats() dicts #140638

Open

brandtbucher added 3 commits November 20, 2025 22:32

Fix free-threading errors (still need to get to warnings)

bdfe660

Fix bad address

2d25167

Clean up the diff

9be8100

pablogsal approved these changes Nov 21, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Nov 21, 2025

pablogsal reviewed Nov 21, 2025

View reviewed changes

brandtbucher added 2 commits November 22, 2025 10:53

visited -> candidates

7f46627

Account for marked objects in candidates.

866bd56

brandtbucher changed the title ~~GH-140638: Add a GC "visited" stat~~ GH-140638: Add a GC "candidates" stat Nov 22, 2025

Wording

7e65738

brandtbucher enabled auto-merge (squash) November 22, 2025 21:34

brandtbucher merged commit 227b9d3 into python:main Nov 22, 2025
46 checks passed

bedevere-app bot removed the awaiting merge label Nov 22, 2025

Uh oh!

GH-140638: Add a GC "candidates" stat #141814

GH-140638: Add a GC "candidates" stat #141814

Conversation

brandtbucher commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pablogsal left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pablogsal Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brandtbucher Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nascheme Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

brandtbucher commented Nov 21, 2025

Uh oh!

brandtbucher commented Nov 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

brandtbucher commented Nov 21, 2025 •

edited

Loading

pablogsal Nov 21, 2025 •

edited

Loading

brandtbucher Nov 22, 2025 •

edited

Loading

nascheme Nov 22, 2025 •

edited

Loading

brandtbucher commented Nov 22, 2025 •

edited

Loading